feat: use nh3 for HTML sanitization #10264

Open
nouralmaa wants to merge 30 commits into ietf-tools:main from nouralmaa:replace-bleach-nh3

Conversation

nouralmaa (Contributor) commented Jan 15, 2026

fixes #10138

# Allow the protocols/tags/attributes we specifically want, plus anything that nh3 declares
# to be safe.

acceptable_protocols = nh3.ALLOWED_URL_SCHEMES.union(

Contributor Author

TODO check. this is more permissive than bleach's strict superset. however, unsure if this adds extra vulnerabilities since nh3 uses a better maintained parser https://github.com/rust-ammonia/ammonia

Member

I'm going to try to get a few other eyes on this

Collaborator

I think adding tel: and ftp: (really?) are fine from a security perspective.

Contributor

We don't really need ftp, but tel might be useful.

codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.36%. Comparing base (4945809) to head (3d66b11).
⚠️ Report is 3 commits behind head on main.

Files with missing lines    Patch %    Lines
ietf/utils/markdown.py      89.36%     5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #10264   +/-   ##
=======================================
  Coverage   88.36%   88.36%           
=======================================
  Files         325      325           
  Lines       43653    43652    -1     
=======================================
  Hits        38573    38573           
+ Misses       5080     5079    -1     

- "li", "ol", "p", "pre", "q", "s", "samp", "small", "span", "strike", "style",
+ "li", "ol", "p", "pre", "q", "s", "samp", "small", "span", "strike",
  "strong", "sub", "sup", "table", "title", "tbody", "td", "tfoot", "th", "thead",
  "tr", "tt", "u", "ul", "var"

Contributor

xmp is also good

Suggested change
- "tr", "tt", "u", "ul", "var"
+ "tr", "tt", "u", "ul", "var", "xmp"

# to be safe.

acceptable_protocols = nh3.ALLOWED_URL_SCHEMES.union(
{"http", "https", "mailto", "ftp", "xmpp"}

Contributor

Suggested change
- {"http", "https", "mailto", "ftp", "xmpp"}
+ {"ftp", "http", "https", "mailto", "tel", "xmpp"}

Sort. Add "tel".

(If performance is dictated by order, move "https" to the top.)
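
On the ordering question: at the Python level the scheme collection is a set, so the order in which the literal is written does not affect the resulting value, and sorting only helps readers. A minimal sketch using the standard library alone; the variable names and the stand-in defaults are illustrative assumptions, not the real contents of nh3.ALLOWED_URL_SCHEMES:

```python
# Two set literals with the same members, written in different orders.
unsorted_literal = {"http", "https", "mailto", "ftp", "xmpp", "tel"}
sorted_literal = {"ftp", "http", "https", "mailto", "tel", "xmpp"}

# Sets compare by membership only, so the literal order is irrelevant.
same_members = unsorted_literal == sorted_literal

# Membership tests are hash lookups; they do not scan in literal order.
https_allowed = "https" in sorted_literal

# union() against a stand-in for the library defaults.
# ASSUMPTION: hypothetical values, not the real nh3.ALLOWED_URL_SCHEMES.
library_defaults = {"http", "https", "mailto"}
acceptable_protocols = library_defaults.union(sorted_literal)
```

What the sanitizer does with the set internally is outside the scope of this sketch.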

- protocols=acceptable_protocols,
- strip=True,
+ url_schemes=acceptable_protocols,
+ link_rel=None

Contributor

This doesn't seem right to me. The default value is safer. Are there cases where links opened will need window.opener? I can't imagine that being necessary for user-generated content.

Same below.

Contributor Author

cool, changed it here as there were no rel attributes before

- protocols=acceptable_protocols,
- strip=True,
+ _liberal_nh3_cleaner = nh3.Cleaner(
+ tags=acceptable_tags.union({"mg", "figure", "figcaption"}),

Contributor

typo?

Suggested change
- tags=acceptable_tags.union({"mg", "figure", "figcaption"}),
+ tags=acceptable_tags.union({"img", "figure", "figcaption"}),

"""Returns the given HTML sanitized, and with the given tags removed."""
allowed = acceptable_tags - set(t.lower() for t in tags)
- return bleach.clean(html, tags=allowed, strip=True)
+ return nh3.clean(html, tags=allowed)

Contributor

You might want to audit invocations of this function.

Member

Not sure if @martinthomson did so himself, but it looks to me like the only use is via the removetags filter in htmlfilters.py, which is not used anywhere (I also checked the DBTemplates)

If so, we can just lose this method entirely.

If not, I think this changes remove_... to escape_... because nh3 doesn't have a strip option (but that's just based on reading docs)
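
Whichever way the sanitizer treats disallowed tags, the allow-list handed to it is computed the same way in both versions of the function. A small standard-library sketch of that set difference; the tag subset and the helper name are hypothetical, chosen only for illustration:

```python
# ASSUMPTION: a small hypothetical subset of the project's acceptable_tags.
acceptable_tags = {"a", "b", "em", "p", "strong"}


def allowed_after_removal(tags):
    """Return the allow-list with the given tags (case-insensitive) removed."""
    return acceptable_tags - {t.lower() for t in tags}


# "B" is lowercased before removal; "script" was never allowed, so
# subtracting it has no effect.
allowed = allowed_after_removal(["B", "em", "script"])
```

Here `allowed` ends up as the set containing "a", "p" and "strong". What happens to a removed tag's markup and content is decided by the sanitizer, not by this step.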

Contributor Author

Doesn't seem to raise any validation errors

rjsparks (Member) commented Jan 30, 2026

Hi @nouralmaa - Thanks for continuing to work on this.

One point on something that's not obvious - commits bf72278 and 9736b74 touch an old data migration (see the filename) - those will never run again, and it's better to not touch these migration files as they should show what happened when they actually ran on the production database. If it's easy, please revert those commits. If it's not, we'll do it when this gets towards final review.

When we have major changes to django (we may have one coming with django5) we tend to squash the migrations and that one would go away at that time.

'<a href="https://www.ietf.org" rel="nofollow">https://www.ietf.org</a>',
)
self.assertEqual(
linkify("https://mailman3.ietf.org/mailman3/lists/[email protected]/"),

Contributor Author

fixes #10120

@nouralmaa nouralmaa marked this pull request as ready for review January 30, 2026 19:37
value = conditional_escape(value)
text = mark_safe(_linkify(value)) # _linkify is a safe operation
value = urlize(value, autoescape=True) # _linkify is a safe operation
text = mark_safe(value)

Member

This won't work as written - it needs to call urlize() in all cases, not only when autoescape is true. However,

  1. The commit log around ietf.doc.templatetags.ietf_filters.urlize mentions issues with Django's urlize and its handling of adjacent parentheses. Need to investigate that and whether it's been fixed.
  2. If it is ok to switch to urlize, we should just do away with linkify as it's redundant.
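
The control-flow point can be sketched without Django: escaping is conditional, while linking always runs. Everything below is a stand-in under stated assumptions: `toy_urlize` is a hypothetical, deliberately naive replacement for the real linker, and `html.escape` stands in for Django's `conditional_escape`.

```python
import re
from html import escape

# Hypothetical toy linker: wraps bare http(s) URLs in anchors.
# ASSUMPTION: not Django's urlize and not safe for production use.
_URL = re.compile(r"(https?://[^\s<]+)")


def toy_urlize(text):
    return _URL.sub(r'<a href="\1">\1</a>', text)


def linkify(value, autoescape=True):
    # Escaping depends on the autoescape flag...
    if autoescape:
        value = escape(value)
    # ...but linking happens on every call, whatever the flag says.
    return toy_urlize(value)


plain = linkify("see https://www.ietf.org", autoescape=False)
escaped = linkify("<b> https://www.ietf.org")
```

Both calls produce a link; only the second one escapes the surrounding markup first.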

Contributor Author

have adjusted so this can be called in both cases.

  1. re the handling of adjacent parentheses and this commit, I can understand why django's urlizer might not be ideal for linkification like [REF](http://example.com/foo), and the issue still stands. however, I think the markdown filter inside ietf.utils.templatetags.htmlfilters can be used in such cases with links intact
  2. yes, but it would require changing lots of internal html templates and tests to do away with the linkify filter completely. maybe can be raised in a separate issue? can also try to have a look soon.
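
For readers unfamiliar with the adjacent-parentheses problem, here is a standard-library illustration of the general failure mode. It says nothing about how Django's urlize behaves today, which the thread leaves open; the regex and the helper name are assumptions made for this sketch.

```python
import re

# A naive matcher: everything up to the next whitespace counts as the URL.
NAIVE_URL = re.compile(r"https?://\S+")


def trim_unbalanced_parens(url: str) -> str:
    """Drop trailing ')' characters that have no matching '(' in the URL."""
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url


markdown_style = "[REF](http://example.com/foo)"

# The naive match swallows the closing parenthesis of the markdown link.
naive_matches = NAIVE_URL.findall(markdown_style)

# Trimming unbalanced parentheses recovers the intended target.
trimmed_matches = [trim_unbalanced_parens(u) for u in naive_matches]

# A URL with balanced parentheses is left untouched.
balanced = trim_unbalanced_parens("https://example.com/wiki/Foo_(bar)")
```

The naive pattern returns `http://example.com/foo)` with the stray parenthesis, while the trimmed result is `http://example.com/foo`.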


def test_view_status_update(self):
chair = RoleFactory(name_id='chair',group__type_id='wg')
event = GroupEventFactory(type='status_update',group=chair.group)

Member

Unexpected change. Unless there's a reason not to test this behavior, need to fix the assertion below rather than drop the test.

@nouralmaa (Contributor Author)

> Hi @nouralmaa - Thanks for continuing to work on this.
>
> One point on something that's not obvious - commits bf72278 and 9736b74 touch an old data migration (see the filename) - those will never run again, and it's better to not touch these migration files as they should show what happened when they actually ran on the production database. If it's easy, please revert those commits. If it's not, we'll do it when this gets towards final review.
>
> When we have major changes to django (we may have one coming with django5) we tend to squash the migrations and that one would go away at that time.

re these commits, I've reverted the changes and also made a change to use html instead of having the links hardcoded into the slugs for that template. I imagine that may be necessary down the road in any case, but feel free to revert if not.

Development

Successfully merging this pull request may close these issues.

Replace datatracker's use of bleach

6 participants